Malicious code classification algorithm based on multi-feature fusion
LANG Dapeng, DING Wei, JIANG Haocheng, CHEN Zhiyuang
Journal of Computer Applications
2019, 39 (8): 2333-2338.
DOI: 10.11772/j.issn.1001-9081.2019010116
Most research on malicious code classification addresses either family classification or the separation of malicious from benign code, while classification by category has received relatively little attention. To address this gap, a malicious code classification algorithm based on multi-feature fusion was proposed. Three sets of features extracted from texture maps and disassembly files were fused for classification. Firstly, gray-level co-occurrence matrix features were extracted from the source files and disassembly files, and operation code (opcode) sequences were extracted with the n-gram algorithm. Secondly, an improved Information Gain (IG) algorithm was used to select the opcode features. Thirdly, Random Forest (RF) was used as the classifier to learn the multiple feature groups after normalization. Finally, a random forest classifier based on multi-feature fusion was realized. In training and testing on nine types of malicious code, the proposed algorithm achieves 85% accuracy. It is more accurate than a random forest using a single feature set and than multi-layer perceptron and logistic regression classifiers using multiple features.
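The opcode steps of the abstract (n-gram extraction followed by information-gain feature selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe how their IG algorithm is improved, so the code below uses the standard information gain of a binary "n-gram is present" feature. The function names, the choice of n = 2, and the toy opcode sequences and labels are all hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return the opcode n-grams of a sequence as a list of tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, labels, gram):
    """Standard IG of the binary feature 'gram occurs in the sample'.

    Assumption: this is the textbook definition, not the improved
    variant mentioned in the abstract.
    """
    present = [y for s, y in zip(samples, labels) if gram in s]
    absent = [y for s, y in zip(samples, labels) if gram not in s]
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


# Hypothetical toy data: (opcode sequence, category label).
raw = [
    (["push", "mov", "call", "ret"], "worm"),
    (["push", "mov", "call", "pop"], "worm"),
    (["xor", "jmp", "xor", "jmp"], "trojan"),
    (["xor", "jmp", "mov", "ret"], "trojan"),
]
samples = [set(opcode_ngrams(ops, n=2)) for ops, _ in raw]
labels = [label for _, label in raw]

# Rank every observed 2-gram by information gain and keep the best three.
vocabulary = sorted(set().union(*samples))
ranked = sorted(
    vocabulary,
    key=lambda g: information_gain(samples, labels, g),
    reverse=True,
)
top_features = ranked[:3]
```

In a full pipeline along the lines the abstract describes, the selected n-gram features would be normalized, concatenated with the gray-level co-occurrence matrix features, and passed to a random forest classifier (for example scikit-learn's `RandomForestClassifier`, which is an assumption here rather than a tool named by the source).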